Venezuelan equine encephalitis virus (VEEV), a member 
of the membrane-containing Alphavirus genus, is a human 
and equine pathogen, and has been developed as a biological 
weapon. Using electron cryo-microscopy (cryo-EM), 
we determined the structure of an attenuated vaccine 
strain, TC-83, of VEEV to 4.4 Å resolution. Our density map 
clearly resolves regions (including E1, E2 transmembrane 
helices and cytoplasmic tails) that were missing in the 
crystal structures of domains of alphavirus subunits. 
These new features are implicated in the fusion, assembly 
and budding processes of alphaviruses. Furthermore, our 
map reveals the unexpected E3 protein, which is cleaved 
and generally thought to be absent in the mature VEEV. 
Our structural results suggest a mechanism for the initial 
stage of nucleocapsid core formation, and shed light on the 
virulence attenuation, host recognition and neutralizing 
activities of VEEV and other alphavirus pathogens. 
Introduction 

Venezuelan equine encephalitis virus (VEEV) is a mosquito- 
borne viral pathogen that has caused periodic, extensive 
outbreaks of human and equine diseases throughout 
the Americas, including in Texas (Weaver et al, 2004). 
Epidemics emerge when enzootic VEEV strains, which circulate 
among rodents in sylvatic or swamp habitats, acquire 
mutations that alter their host range to mediate equine 
amplification and transmission by mosquitoes that have 
more promiscuous host preferences. In several countries 
including the United States, VEEV has been developed as a 
biological weapon (Bronze et al, 2002). Consequently, VEEV 
is classified as an NIAID (National Institute of Allergy and 
Infectious Diseases) Category B priority pathogen. Despite its 
threat to the public, no human vaccines or antiviral drugs 
thus far have been licensed. The attenuated TC-83 strain 
(Berge et al, 1961) is one of the few experimental vaccines 
that have been used to protect laboratory workers and 
military personnel. It was derived by 83 passages of the 
wild-type Trinidad donkey strain in guinea pig heart cells, 
which resulted in 12 nucleotide substitutions, 8 of which are 
non-synonymous (Kinney et al, 1993). 

VEEV is one of the species in the Alphavirus genus of 
Togaviridae (Strauss and Strauss, 1994), a family of mem- 
brane-containing single-stranded RNA viruses, which also 
includes Sindbis (SINV), Semliki Forest (SFV), Ross River 
(RRV) and Chikungunya (CHIKV) viruses. To date, crystal 
structures have been obtained for domains of the three 
major alphavirus structural proteins: the C-terminal protease 
domain of the capsid protein (CP; Choi et al, 1991, 1997) and 
the elongated ectodomains of the envelope glycoproteins E1 
(Lescar et al, 2001; Gibbons et al, 2004; Roussel et al, 2006; 
Li et al, 2010; Voss et al, 2010) and E2 (Li et al, 2010; Voss 
et al, 2010). However, the crystal structures of E1 and E2 
endodomains are still missing. (The term 'ectodomain' refers 
to the portion of the protein outside the membrane, while 
'endodomains' refers to the regions within or inside the 
membrane.) In addition, electron cryo-microscopy (cryo-EM) 
has been used to determine the entire structures of 
various alphaviruses and their mutants, but only at moderate 
resolution (Mancini et al, 2000; Pletnev et al, 2001; 
Zhang et al, 2002, 2005; Mukhopadhyay et al, 2006; Sherman 
and Weaver, 2010; Kostyuchenko et al, 2011). These struc- 
tures have not only revealed two nested shells (an outer 
glycoprotein shell and an inner nucleocapsid shell) with the 
same T = 4 icosahedral symmetry, but also have localized 
several glycosylation and antibody binding sites on their 
surfaces. 

Alphaviruses enter the host cells by receptor-mediated 
endocytosis (Garoff et al, 1994). Acidification in the endo- 
some triggers the structural rearrangement of the outer 
glycoprotein shell (Wahlberg and Garoff, 1992), which drives 
the fusion between the viral envelope and the endosomal 
membranes. The membrane fusion followed by the dis- 
assembly of nucleocapsid cores with ribosomal subunits 
(Wengler et al, 1992) allows the viral genome (~11.5 kb) to 
be released into the host cytoplasm. Following the translation 
of the non-structural polyprotein from the genomic RNA and 
synthesis of negative strand RNA, a subgenomic 26S RNA is 
then transcribed and translated into a single structural polypeptide 
C-p62-6K-E1 (Rice and Strauss, 1981), which is 
subsequently cleaved into several structural proteins. The CP 
is autoproteolytically processed, and 240 CPs encapsidate one 
copy of viral RNA genome to form the nascent nucleocapsid 
core. The remaining polypeptide is translocated to the en- 
doplasmic reticulum (ER) membrane (Liljestrom and Garoff, 
1991), where it undergoes cotranslational cleavage to yield 
pE2, 6K and E1 proteins. The pE2 and E1 proteins form 
heterodimers (Barth et al, 1995; Andersson et al, 1997) in 
the ER and are transported to the Golgi complex, where they form 
trimers of pE2-E1 heterodimers (Mulvey and Brown, 1996). 
In the Golgi, the small E3 protein is cleaved from pE2 to yield the 
mature E2 protein (Jain et al, 1991; Salminen et al, 1992). 
Finally, the trimers of E1-E2 heterodimers are transported to 
the plasma membrane, where they interact with the nascent 
nucleocapsid cores in the cytoplasm to form the intact 
progeny viruses that bud out of the host cell for the next 
round of infection. 

Here, we report an all-atom model of the entire VEEV 
derived from our 4.4 Å resolution cryo-EM density map. Our 
model covers the full-length E1 and E2 glycoproteins (including 
the ectodomain, stem region, transmembrane (TM) helix 
and C-terminal tail), the E3 protein, and all the structurally 
ordered portions of CP (including the protease domain and 
one additional helix). Our map and model provide new 
insights into how different components of an alphavirus 
interact to self-assemble into an infectious viral particle. 



Results and discussion 

3D reconstruction and averaging subunits within an 
asymmetric unit 

A cryo-EM structure of VEEV (TC-83 strain) was obtained 
initially at 4.8 Å resolution from ~37 000 virus particle 
images recorded in a 300-keV electron cryo-microscope 
(Figure 1A). The reconstructed map (Figure 1B) shows 80 
trimeric spikes on its surface, each containing three copies 
of E1 (48 kDa), E2 (47 kDa) and E3 (7 kDa) proteins. Three E1 
molecules form the edges of a triangle and surround an E2 
homotrimer, which protrudes outward forming the tip of 
the spike. A slice through the 3D density map (Figure 1C) 
provides a clear view of the viral membrane and E1/E2 TM 
helices, which extend from the outer glycoprotein shell to the 
inner nucleocapsid shell. Directly below the nucleocapsid 
shell is a layer of relatively disordered density that corre- 
sponds to a mixture of CP and genomic RNA. Interior 
to this CP/RNA shell, two additional shells are present: a 
low-density shell and a dense core. 
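
The copy numbers quoted in this section (240 CPs, 80 trimeric spikes, four unique copies per asymmetric unit) follow from T = 4 quasi-equivalent icosahedral symmetry. The short sketch below only checks that arithmetic; the function name and dictionary keys are illustrative and are not taken from any software used in this work.

```python
def icosahedral_copy_numbers(t_number: int) -> dict:
    """Copy numbers for a quasi-equivalent icosahedral shell with triangulation number T."""
    subunits = 60 * t_number        # 60 asymmetric units, each holding T unique subunits
    pentamers = 12                  # one capsomere on each icosahedral five-fold vertex
    hexamers = 10 * (t_number - 1)  # remaining capsomeres occupy quasi six-fold positions
    trimers = subunits // 3         # spikes, each built from three E1-E2 heterodimers
    return {
        "subunits": subunits,
        "pentamers": pentamers,
        "hexamers": hexamers,
        "trimeric_spikes": trimers,
    }


counts = icosahedral_copy_numbers(4)
print(counts)
# {'subunits': 240, 'pentamers': 12, 'hexamers': 30, 'trimeric_spikes': 80}
```

The 12 pentamers and 30 hexamers together account for 12 x 5 + 30 x 6 = 240 capsid proteins, matching the subunit count.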




[Figure 1C inset: 1D radial density profile; axis: Radius (Å), ticks at 200, 270 and 340; annotation: lipid bilayer]



Figure 1 3D reconstruction of VEEV. (A) A typical CCD image of VEEV TC-83 strain embedded in vitreous ice. Scale bar: 50 nm. (B) Radially 
coloured 3D reconstruction of VEEV, showing the E1 basal triangle (green) and E2 central protrusion (blue) for each spike. Scale bar: 10 nm. 
(C) A slice through the 3D density map 20 pixels from the origin. The inset is the 1D radial density profile of the map and is aligned to the 
slice image. (D) One asymmetric unit of the virus containing four unique copies of E1 (magenta), E2 (cyan), E3 (orange) and CP (blue). 
The cryo-EM densities for the viral membrane (yellow) and genomic RNA (green) are also displayed at a slightly lower isosurface threshold. 
Scale bar: 2 nm. 
Figure 2 E1 and E2 ectodomains. (A) Our models for VEEV E1 (magenta) and E2 (cyan). The homology model parts and de novo model parts 
are shown as ribbon and stick, respectively. The asymmetric unit averaged map is shown in transparent grey. The de novo modelled part of the E1 
stem region (residues 390-402) is coloured in yellow. Subdomains of E1 (I, II and III) and E2 (A, B, C and D) are labelled in black circles, 
following the previous definition (Lescar et al, 2001; Voss et al, 2010). Scale bar: 2 nm. (B) The separation of β-strands at E2 subdomain C, 
displayed at a slightly higher isosurface threshold. It also shows the protrusion density for the glycan at E2-N318 and the annotated atomic structure of 
N-acetylglucosamine (NAG). (C) A 180° rotation of (A) shows that the E1 stem region wraps around E1 subdomain III. The blue dashed arrow 
points to a small, unidentified density. (D) The E2 subdomain D. (E) The protrusion density for the glycan at E1-N134 and the annotated NAG. 
(F) The E1 fusion loop (orange), which sits between E2 subdomains A and B. 



To further improve the resolvability of various structural 
features, we performed an additional averaging (Zhou et al, 
2000; Zhang et al, 2008, 2010; Wolf et al, 2010) of the four 
unique sets of E1-E2-E3-CP molecules within one asymmetric 
unit of the virus (Figure 1D), resulting in an averaged map at 
4.4 Å resolution (Supplementary Figure S1). This averaged 
map substantially improves the density connectivity for the 
loop regions and the TM helices (Figures 2A and 5A). In 
addition, the β-strands are better resolved, for example, in the 
E2 ectodomain (Figure 2B). It is noteworthy that the averaged 
density appears smoother than the original map, with a 
significant reduction of noise and the disappearance of 
some putative side-chain densities. However, density bulges 
corresponding to many of the bulky side chains (F, W and Y) 
remain visible in the averaged map. As a result, we primarily 
used this averaged density map for the subsequent structural 
modelling and interpretation. 
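
The noise reduction seen after averaging the four quasi-equivalent subunits is what one would expect statistically: averaging N independent noisy copies lowers the noise standard deviation by roughly the square root of N. The sketch below illustrates only this statistical point with synthetic Gaussian noise. It is not the averaging procedure used for the map, which also requires aligning the four sets of densities, and the function name and parameters are illustrative assumptions.

```python
import random
import statistics


def averaged_noise_std(n_copies: int, n_voxels: int = 20000,
                       sigma: float = 1.0, seed: int = 0) -> float:
    """Standard deviation of pure noise after averaging n_copies independent copies."""
    rng = random.Random(seed)
    averaged = [
        sum(rng.gauss(0.0, sigma) for _ in range(n_copies)) / n_copies
        for _ in range(n_voxels)
    ]
    return statistics.pstdev(averaged)


single = averaged_noise_std(1)
four = averaged_noise_std(4)
# For independent noise the ratio should be close to sqrt(4) = 2.
print(f"1 copy: {single:.3f}  4 copies averaged: {four:.3f}  ratio: {single / four:.2f}")
```

In a real map the noise in neighbouring subunits is not perfectly independent and the subunits are only quasi-equivalent, so the actual gain is smaller than this idealized factor of two.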

E1 and E2 ectodomains 

We built the full-length VEEV E1 and E2 models using a 
combination of homology modelling and de novo modelling. 
A majority of the E1 and E2 ectodomains (E1: subdomains DI, DII, 
DIII; E2: subdomains A, B, C) were constructed by homology 
modelling, while part of the E1 stem loop (residues 390-402, 
Figure 2A and C), a previously unidentified E2 subdomain D 
(residues 342-367, Figure 2D), and the entire E1 and E2 endodomains 
were built de novo. Subsequently, the homology part 
and the de novo part were stitched together and refined against 
cryo-EM density to produce our final E1 and E2 atomic models 
(see Materials and methods). Because the side-chain conformations 
(rotamers) in our final model, except those of bulky residues with 
evident protruding densities, are not fully restrained by the 
cryo-EM densities, their accuracies are limited, as with any 
crystal structure determined at a similar resolution. 

To compare our final model of 
VEEV with the CHIKV crystal structure, we fitted the CHIKV E1 
and E2 ectodomains separately into our asymmetric unit 
averaged map of VEEV as rigid bodies and then calculated 
the individual root-mean-square deviation (RMSD) per Cα 
atom for E1 and E2 between the fitted CHIKV model and 
our VEEV model. The results (Supplementary Figure S2) 
show that the E2 ectodomain (residues 1-341) varies 
more (RMSD 4.2 Å) than the E1 ectodomain (residues 
1-391) (RMSD 1.8 Å), with the largest deviations mapped 
not only to the loop regions but also to some of the β-strands. 
Similarly, we also computed the RMSD between our VEEV 
model and the fitted SINV E1-E2 crystal structure at low pH 
(PDB ID: 3MUU, chain A) (Li et al, 2010), in which 
subdomain B of the E2 ectodomain is not resolved. Our results 
show an RMSD of 2.4 Å for the E2 ectodomain and 2.9 Å for the E1 
ectodomain, with the largest variations mapped to the E1 
fusion loop region (Supplementary Figure S3). 
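
For readers unfamiliar with the metric, a Cα RMSD of this kind can be sketched as below. The sketch assumes the two models have already been superposed (here, by the rigid-body fit into the density) and that their residues are paired one-to-one. The function names and the toy coordinates are illustrative only and are not the software or data used in this work.

```python
import math


def ca_deviations(coords_a, coords_b):
    """Per-residue distance (in Å) between paired C-alpha positions."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]


def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation over paired C-alpha positions."""
    deviations = ca_deviations(coords_a, coords_b)
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


# Toy example with two residues (hypothetical coordinates in Å).
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
fitted = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(ca_deviations(model, fitted))  # per-residue deviations: 3.0 and 4.0
print(round(ca_rmsd(model, fitted), 3))
```

The per-residue deviations are what allow the largest differences to be mapped onto specific loops or strands, while the RMSD summarizes them in a single number.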

The number and positions of the N-linked glycosyl- 
ation sites among the glycoproteins in alphaviruses are not 
absolutely conserved (Strauss and Strauss, 1994), and the 
locations and types of glycans in VEEV have not been well 
characterized. The N-linked glycosylation sites are generally 
identified by an Asn-X-Thr/Ser motif, where X is any amino 
acid except proline (Gavel and von Heijne, 1990). Based on 
this criterion, only two residues (E1-N134 and E2-N318) 
are likely glycosylated in VEEV. In our cryo-EM map, we 
indeed observe prominent protruding densities at both sites, 
each of which can accommodate only a monosaccharide 
(N-acetylglucosamine) rather than a disaccharide (Figure 2B and 
E). It is noteworthy that the E1 glycan is surface exposed, 
while the E2 glycan is buried near the outer lipid membrane. 
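
The Asn-X-Thr/Ser criterion can be expressed as a simple pattern search. The sketch below is a minimal illustration of that criterion only; the function name and the toy sequence are hypothetical, and the actual VEEV E1 and E2 sequences are not reproduced here.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping motifs be reported as well.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list:
    """Return 1-based positions of Asn residues in Asn-X-Thr/Ser motifs (X not Pro)."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


toy_sequence = "MANGTAPNPSQNNTS"  # hypothetical sequence for illustration
print(find_sequons(toy_sequence))  # [3, 12, 13]
```

In the toy sequence, the Asn at position 8 is skipped because it is followed by proline. A motif match indicates only a candidate site; whether it is actually glycosylated has to be confirmed experimentally or, as here, by density in the map.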

Our map and models also reveal several features that are 
implicated in the alphavirus fusion process in the endosome. 
In our density map of mature VEEV at neutral pH, 
the E1 fusion loop is clearly visualized at the cleft between E2 
subdomains A and B (Figure 2F), consistent with the crystal 
structure of CHIKV (Voss et al, 2010). Additionally, several 
histidines, specifically two completely conserved (H349 and 
H353) and two partially conserved (H358 and H361) 
(Supplementary Figure S4), are located in the previously 
unidentified E2 subdomain D, which consists of a loop and 
a helix (Figure 2D). The location suggests that upon low pH 
exposure in the endosome, the protonation of these histidines 
may promote their interactions with the neighbouring negatively 
charged lipid head groups and anchor E2 to the lipid 
membrane, thus facilitating the separation of E1-E2 heterodimers 
along with the E1 homotrimerization. 

The VEEV TC-83 vaccine strain differs 
from its parental Trinidad donkey strain by five residues 
(K7N, H85Y, T120R, V192D and T296I) in E2 and one residue 
in E1 (L161I) (Kinney et al, 1993). The E2 T120R mutation 
has been found to be the major structural determinant of 
attenuation (Kinney et al, 1993). In our structure, this residue 
R120 is located at the E2 trimeric interface and at the top 
surface of the spike complex (Figure 3). This location suggests 
that the in vitro adaptation of residue 120 from a neutral to a 
positively charged residue may result from binding to the 
negatively charged heparan sulphate, which is the putative 
receptor on the surface of cultured cells, while the attenuation 
in vivo is likely a result of less efficient spreading of the 
virus due to binding to some negatively charged molecules 
(Klimstra et al, 1998; Byrnes and Griffin, 2000). 

Our VEEV E2 model also provides the structural basis 
to interpret the previous genetic studies on the host range 
and neutralizing activities of VEEV. An S218N mutation 
is believed to have mediated the recent emergence of VEE 
in southern Mexico by adapting subtype IE enzootic 
strains to more efficiently infect the epidemic vector, 
Aedes (Ochlerotatus) taeniorhynchus (Brault et al, 2004). 
The G193R and T213R mutations are associated with the 
1992 subtype IC epidemic emergence in Venezuela (Rico- 
Hesse et al, 1995), and experimental equine infections 
demonstrated that the T213R mutation mediates enhanced 
equine viraemia, the critical driver of epidemic VEEV spread 
(Anishchenko et al, 2006). The close clustering of all of these 
host range mutations on the tip of the spikes (Figure 3) 
provides strong circumstantial evidence that this location is 
Figure 3 Mapping of specific residues on VEEV E1 and E2. 
(A) Model of an E1-E2 heterodimer. Two N-linked glycosylation 
sites (E1-N134 and E2-N318) are labelled in green and red, respectively. 
The major determinant of virulence attenuation (residue E2-T120) 
is labelled in dark blue. Two sets of residues (E2-193/213 and 
E2-218) whose mutations strongly affect the equine and mosquito 
host range are labelled in grey and black, respectively. The previously 
identified VEEV epitopes for murine monoclonal antibodies 
(mMAbs; residues 182-207) and human monoclonal antibodies (hMAbs; residues 115-119) 
are coloured in orange and yellow, respectively. (B) Model of an E2 
homotrimer of VEEV in one asymmetric unit. The residue labelling 
is the same as in (A). 



directly involved in receptor binding. Interestingly, the VEEV 
epitopes for murine monoclonal antibodies (mMAbs) and 
human monoclonal antibodies (hMAbs) map to E2 residues 182-207 (Johnson et al, 
1990) and 115-119 (Hunt et al, 2010), respectively, which are 
also located at the tip of the spikes (Figure 3). 



E3 protein 

In both our original map (Figure 4A) and the asymmetric unit 
averaged map (Figure 4B and C), at a slightly lower isosurface 
threshold than used to visualize E1 and E2, we observed 
extra density decorating the outermost portion of E2 above 
subdomains A and B. The location of this density is consistent 
with the E3 densities observed in the previous cryo-EM 
Figure 4 The presence of E3 protein in mature VEEV. (A) The densities for E3 (orange) in one asymmetric unit of the original 3D 
reconstruction. Note that the E3 densities are displayed at a slightly lower isosurface threshold than the E1 (magenta) and E2 (cyan) densities. Scale 
bar: 2 nm. (B, C) Side and top views of the density for E3 in the asymmetric unit averaged map. The crystal structures of CHIKV pE2 (orange) 
and E1 (blue) (PDB code: 3N40) are fitted separately as rigid bodies into the averaged density. Our models of VEEV E1, E2 and E3 are shown in 
magenta, cyan and green, respectively. The blue arrows point to the two rod-like features in the density. (D) SDS-PAGE result of the VEEV TC-83 
samples we used for imaging. The leftmost and rightmost lanes are molecular size markers. Lanes 1-4 are the four batches of VEEV TC-83 
samples used for cryo-EM imaging. 



structures of pE2 cleavage-impaired SINV and SFV mutants 
(Paredes et al, 1998; Wu et al, 2008). 

The E3 protein, ~60 residues in length, is cleaved from the 
pE2 protein in the Golgi complex to yield mature E2. Given 
the 52% sequence identity between VEEV E3 and CHIKV E3, 
we built a homology model for VEEV E3 (residues 1-59) 
using the CHIKV pE2 crystal structure (containing E3) (Voss 
et al, 2010) as the template and placed it in our asymmetric 
unit averaged map based on the fitting of CHIKV pE2 crystal 
structure (Figure 4B and C). There are two rod-like features 
in our E3 density that match the two α-helices of our E3 
homology model (Figure 4B and C, blue arrows). Due to 
disconnected densities above these helices, presumably 
corresponding to the N-terminus of E3 (Figure 4C), our E3 
homology model was not further refined against the cryo-EM 
density. 

To determine whether E3 is present and cleaved from 
the precursor protein pE2 upon maturation in VEEV, we 
performed SDS-PAGE on the four batches of VEEV TC-83 
samples that we used for cryo-EM imaging. A 4-20% polyacrylamide 
gel was used to resolve the small 7 kDa E3 
protein, and the result confirmed the presence of E3 in the 
mature VEEV, although at lower stoichiometry (Figure 4D). 
Interestingly, a very faint band of pE2 (62 kDa) is seen in two 
of the four sample batches, indicating that in some cases 
the cleavage might not be 100% complete. 
Based on our structural and biochemical analysis, it 
appears that the aforementioned density indeed represents 
the cleaved form of E3. While E3 has been found in another 
mature alphavirus, SFV (Wu et al, 2008), this is the first time 
that E3 has been seen structurally in mature VEEV. It remains 
to be determined why E3 proteins stay associated with E2 in 
VEEV after cleavage. It is conceivable that E3 may function to 
maintain the relative orientation between E2 subdomains A 
and B, so as to protect the E1 fusion loop from premature 
exposure to the host membranes (Figure 2F) (Li et al, 2010; 
Voss et al, 2010). 

E1 and E2 endodomains 

Beyond the ectodomains, our cryo-EM structure of the entire 
virus provides us with the unique opportunity to explore 
the protein interactions in the E1 and E2 endodomains. 
In the asymmetric unit averaged map, almost all of the 
bulky side-chain densities along the TM helices remain 
visible after averaging, including E2-W387, E1-W407, 
E1-W409 and E1-Y434 (Figure 5A-C), although some of 
them appear less prominent than the unaveraged densities. 
Using these side-chain densities as the anchor points, we 
can determine the registration of Cα positions for the E1 and 
E2 TM helices and further trace the backbones of the 
entire E1 and E2 endodomains, including their cytoplasmic 
tails (Figure 5A). Consistent with the secondary structure 
Secondary structure prediction for E1 C-terminus 

Pred: CCCCEEEEECCCCCEECCCCCCCCCCCCCCCCCCCCCCCHHHHHHHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHCCCCCC 
AA: IHPEFRLQICTSYVTCKGDCHPPKDHIVTHPQYHAQTFTAAVSKTAWTWLTSLLGGSAVIIIIGLVLATIVAMYVLTNQKHN 

370 380 390 400 410 420 430 440 

Secondary structure prediction for E2 C-terminus 

Pred: CCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCCCCCCCHHHHHHHHCCCCCCC 
AA: HRYPMSTILGLSICAAIATVSVAASTWLFCRSRVACLTPYRLTPNARIPFC.LAVLCCARTARA 

370 380 390 400 410 420 



Figure 5 E1 and E2 endodomains and their interactions with the CPs. (A) Our models for E1 (magenta) and E2 (cyan) endodomains and CP 
(dark blue). The homology model parts and de novo model parts are shown as ribbon and stick, respectively. The asymmetric unit averaged 
map is shown in transparent grey. Various features are highlighted: E2 Y-R-L motif (red), E1 G415/G416 at the kink region (green), E2 C396/ 
C416/C417 near the inner membrane (yellow) and the helix (residues 115-124) of CP (orange). The disordered densities for the lipid bilayer 
and genomic RNA are simplified with transparent orange and green lines, respectively. Scale bar: 1 nm. (B, C) The prominent side-chain 
densities for E1-W407, E1-W409 and E1-Y434 in the averaged density map. (D) Same as (A) with less density transparency, showing the E2 
C-terminal tail and its interaction with the CP pocket. The blue dashed arrow points to the small C-terminal helix of E2 (residues 409-416). (E) 
Different viewing angle (rotation of 70° about the z axis) of (D) showing the density for the previously unidentified helix of CP (indicated by the blue 
dashed arrow). (F) Secondary structure prediction for the E1 and E2 C-termini from PSIPRED. 



prediction (Figure 5F), the E1 TM helix (residues 403-442) 
was modelled as two consecutive helices separated by a kink, 
while the E2 TM helix (residues 367-401) was modelled 
as a long, straight helix. Below the kink and towards the 
nucleocapsid core, the angle between the E1 and E2 TM helices 
is very similar to the characteristic angle associated with 
a leucine zipper (O'Shea et al, 1991). 

In our model, two highly conserved glycines (G415 and 
G416) (Supplementary Figure S5) are located at the E1 kink 
region (Figure 5A), which is necessary to bring the two TM 
helices into close proximity despite the large size of the E1 and E2 
ectodomains. This conserved GG motif in alphaviruses is 
different from the canonical GXXXG (with X being any 
residue) motif that is commonly found at the interface of 
interacting TM helices, where two glycines are brought to the 
same side of the helix, allowing close contact between 
the two helices (Kleiger et al, 2002). Given the register of 
E1-W407 and E1-W409, and assuming the helix keeps its 
winding path at the kink region, which is typically the case, 
these two glycines have to face away from the E2 TM helix. 
Therefore, it is unlikely that this GG motif is directly involved 
in the interaction between the E1 and E2 TM helices. Instead, it may 
function to alleviate the steric forces at the inner-bending 
side of the kink. Previous studies have shown that mutations 
of these glycines to leucines in SFV destabilize the E1-E2 
heterodimer and promote the formation of the E1 homotrimer 
during fusion (Sjoberg and Garoff, 2003). Interestingly, this 
kink leaves space for a small globular density situated 
between the upper parts of the E1 and E2 TM helices 
(Figure 2C, blue dashed arrow). This yet-to-be-assigned 
density is located below the two completely conserved residues 
E2-Y359 and E2-Y360 (Supplementary Figure S4), but is 
significantly larger than that expected for a tyrosine side 
chain. 
During alphavirus maturation, the glycoproteins embedded 
in the plasma membrane use their cytoplasmic tails to 
interact with the nucleocapsid core in order to form an intact 
virus particle and bud out of the cell. The molecular mechan- 
ism of this process is not well understood, partially due to the 
lack of high-resolution structures. Here, our cryo-EM struc- 
ture for the first time clearly resolves the entire cytoplasmic 
tail (C-terminus) of E2 (residues 402-423), which extends 
through the inner membrane, then interacts with the hydro- 
phobic pocket of CP (presumably via a previously reported 
Tyr-X-Leu tripeptide structural motif (Zhao et al, 1994; 
Owen and Kuhn, 1997)), and finally loops back to the inner 
membrane (Figure 5A and D). A short, rod-like density is 
revealed in this 'hairpin' region (Figure 5D, blue dashed 
arrow), consistent with the predicted α-helix for residues 
409-416 (Figure 5F). In addition, our model of the E2 
C-terminus shows that three completely conserved 
(Supplementary Figure S4), and presumably palmitoylated 
(Gaedigk-Nitschko and Schlesinger, 1991; Ivanova and 
Schlesinger, 1993) cysteines (C396, C416 and C417) are 
located near the lipid head groups of the inner membrane 
(Figure 5A). The interactions between their palmitoylated 
chains and the lipid tails may help anchor the E2 C-terminus 
to the cytoplasmic side of the viral membrane, thus promot- 
ing their interaction with the CPs. Considering that the 
E2 C-terminus originally adopts a membrane-spanning 
conformation in ER during biosynthesis (Liljestrom and 
Garoff, 1991), our observation directly supports the hypo- 
thesis (Ivanova and Schlesinger, 1993) that post-translational 
palmitoylation of cysteines triggers the reorientation of the 
entire E2 C-terminus. 

CP and nucleocapsid core 

For the CP, a crystal structure of the protease domain of VEEV 
TC-83 (residues 120-275) has been reported previously (PDB 
ID: 1EP5). Fitting of this domain into our cryo-EM density 
map (Figure 6A) reveals significant connecting density at 
the intra-capsomere CP/CP interface (Figure 6B), but little 
connecting density at the inter-capsomere CP/CP interface 
(Figure 6C). It is likely that during the initial nucleocapsid 
core assembly process, the pentameric and hexameric cap- 
someres may be formed first and then brought together by 
some external forces rather than the lateral CP/CP inter- 
actions in the protease domains. 

The cryo-EM density of CP also, for the first time, shows 
the predicted α-helix at residues 115-124 (Figure 6B), which 
is missing in the crystal structures (Choi et al, 1991, 1997). 
This helix (Figure 5E, blue dashed arrow) bridges the struc- 
tured C-terminal protease domain and the unstructured, 
highly basic N-terminus, which interacts with genomic RNA 
in the CP/RNA mixture shell (Figure 1C). The sequence of 
this short helix overlaps with the 'linker region' (Wengler, 
2009), a stretch of highly conserved residues (109-125 for 
VEEV, Supplementary Figure S6) that binds to cellular 60S 
ribosomal subunits, which function to disassemble the 
nucleocapsid cores after they are released into the cytosol 
(Wengler et al, 1992). Considering the current location of the 
linker regions in the mature virus (at the inner surface of the 
core), the CPs may undergo some conformational changes 
after the core is released into the cytosol and expose their linker regions 
to the cellular factors. 

The spatial arrangement of the first ~120 N-terminal 
residues of CP in the mature virus has been a mystery. 
There is a putative helix in this region (helix I, residues 
34-51 for VEEV, Figure 7B) that is believed to form a coiled- 
coil structure between two neighbouring CPs and to stabilize 
the core (Perera et al, 2001). However, when analysing our 
map, we did not observe any clear density corresponding to 
helix I. It is possible that these coiled-coil structures (around 
30 Å in length) are located in the low-density shell region 
(from 95 to 130 Å in radius) between the CP/RNA mixture 
shell and the central core (Figure 7A). Their distribution may 




Figure 6 CP/CP interactions in the nucleocapsid core. (A) Our model of the entire nucleocapsid core fitted into the cryo-EM density. The four 
copies of CPs in one asymmetric unit are coloured in red, blue, green and yellow. Scale bar: 5 nm. (B) Zoomed-in view of the intra-capsomere 
interactions. (C) Zoomed-in view of the densities around a quasi three-fold axis. 
Figure 7 Proposed spatial arrangement of the first ~120 residues 
of CP in the nucleocapsid core. (A) In our proposed model, the helix 
I coiled-coil structure (residues 34-51) between two neighbouring 
CPs is located at the low-density shell (radius 95-130 Å) between 
the CP/RNA mixture shell and the central core, while the first 
33 hydrophobic residues are located at the central core (radius 
<95 Å). Scale bar: 10 nm. (B) Secondary structure prediction for 
CP N-terminus from the PSIPRED results. 



not follow icosahedral symmetry, and therefore their densities 
are averaged out in our reconstruction process. Our 
hypothesis places the first 33 residues of CP (mostly hydrophobic 
residues, Figure 7B) at the central dense core (radius 
<95 Å) (Figure 7A), where they would cluster to form a 
'scaffold' and initiate core assembly. Additionally, since only 
10% of the first 50 residues are basic, as opposed to 39% 
of residues 51-117, it is likely that the majority 
of the genomic RNA is confined to the thin CP/RNA mixture 
shell (Figure 1D), as supported by a theoretical analysis 
(Belyi and Muthukumar, 2006). 
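
Two of the numbers in this argument are easy to check by hand. The sketch below estimates the length of helix I from the standard rise of about 1.5 Å per residue for an ideal α-helix, and shows how a fraction of basic residues can be counted. Treating only Lys and Arg as basic is an assumption, the helper names are illustrative, and the toy sequence is hypothetical rather than the VEEV CP sequence.

```python
ALPHA_HELIX_RISE = 1.5  # Å per residue along the axis of an ideal alpha-helix


def helix_length(first_residue: int, last_residue: int) -> float:
    """Approximate axial length (Å) of an alpha-helix spanning the given residues."""
    n_residues = last_residue - first_residue + 1
    return n_residues * ALPHA_HELIX_RISE


def basic_fraction(sequence: str, basic: str = "KR") -> float:
    """Fraction of residues counted as basic (Lys and Arg by default, an assumption)."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(residue in basic for residue in sequence.upper()) / len(sequence)


print(helix_length(34, 51))          # helix I, residues 34-51: 27.0
print(basic_fraction("MKRAAGSTKR"))  # hypothetical 10-residue sequence: 0.4
```

An estimate of 27 Å for the 18 residues of helix I is consistent with the length of around 30 Å quoted above and fits within the 35 Å thickness of the low-density shell (radius 95-130 Å).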

In summary, our 4.4 Å resolution cryo-EM density map and 
derived models reveal many important features not seen in 
the crystal structures of domains of alphavirus subunits, 
including the E1 stem loop, E2 subdomain D, E1/E2 TM 
helices, E2 cytoplasmic C-terminal tail and one additional 
helix of CP. The presence of E3 in the cleaved form is also 
unambiguously identified for the first time in the mature 
VEEV virion. Our structural results suggest a mechanism for 
the initial stage of nucleocapsid core formation, and shed 
light on the virulence attenuation, host recognition and 
neutralizing activities of VEEV as well as other alphavirus 
pathogens. 



Materials and methods 

Virus production and purification 

Baby hamster kidney cells were prepared to 80-90% confluence 
and were inoculated with virus at a multiplicity of 0.1 plaque- 
forming units per cell. Infected cells were incubated at 37°C for 
2 days until cytopathic effects appeared; then the supernatant was 
clarified by centrifugation for 5-10 min at 1000-2000g to remove
cellular debris. The virus was concentrated by precipitation with
7% polyethylene glycol 6000 and 2.3% NaCl at 4°C for >4 h. Then,
the virus was centrifuged at ~2500g for 30 min and was gently
resuspended in 1-2 ml TEN buffer (0.05 M Tris-HCl, pH 7.4, 0.1 M
NaCl and 0.001 M EDTA). The virus suspension was purified by
centrifugation through a 20-70% sucrose/TEN continuous gradient
for 60 min at 35 000 g. The virus band was harvested using a plastic
Pasteur pipette and centrifuged 3× through an Amicon 100 kDa filter
(Ultra-4 Cat. No. UFC810024), resuspending each time to maximum
load volume with TEN. The purified virus was harvested in the
minimal remaining volume after final centrifugation (ca. 50-100 µl).
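
The inoculum needed for a multiplicity of 0.1 plaque-forming units (PFU) per cell follows from simple arithmetic. The sketch below is only a worked example: the cell count and stock titre are hypothetical values chosen for illustration, not figures from this study.

```python
def pfu_required(multiplicity, cell_count):
    """Total plaque-forming units needed to infect cell_count cells."""
    return multiplicity * cell_count


def inoculum_volume_ml(multiplicity, cell_count, titre_pfu_per_ml):
    """Volume of virus stock (ml) that delivers the required PFU."""
    if titre_pfu_per_ml <= 0:
        raise ValueError("titre must be positive")
    return pfu_required(multiplicity, cell_count) / titre_pfu_per_ml


# Hypothetical values: 1e7 cells in the flask, stock titre 1e8 PFU/ml.
cells = 1e7
titre = 1e8
print(f"PFU needed: {pfu_required(0.1, cells):.3g}")
print(f"Stock volume: {inoculum_volume_ml(0.1, cells, titre) * 1000:.3g} µl")
```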

SDS-PAGE 

SDS-PAGE of VEEV TC-83 was performed using a Bio-Rad
Mini-PROTEAN TGX 4-20% polyacrylamide gel (Bio-Rad Inc.) and
the p7708S molecular size marker (New England Biolabs Inc.).

Electron cryo-microscopy 

An aliquot of 2.5 µl of purified VEEV TC-83 sample was applied to
400 mesh Quantifoil R 1.2/1.3 copper grids (hole size 1.2 µm)
(Quantifoil Inc.), which were rapidly plunge frozen in liquid ethane
using an FEI Vitrobot. In total, eight grids, including two grids with
continuous carbon film underneath the samples, were used for
imaging in 16 cryo-EM sessions. Cryo-EM images were collected on
a JEM3200FSC electron cryo-microscope (JEOL, Tokyo) operated
at 300 keV, and at liquid nitrogen specimen temperature. The
microscope is equipped with a field emission gun (FEG) and an
in-column omega energy filter (a slit width of 10 eV was used for
data collection). Approximately 4100 CCD frames were recorded at
a detector magnification of ×141110 (1.07 Å/pixel sampling rate)
using a Gatan 4K × 4K CCD camera (model no. 895, Gatan), with a
defocus range of 0.5-2.5 µm.
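
The sampling figures above fix a few derived quantities that are useful when judging the data: the Nyquist limit, the physical detector pixel size implied by the magnification, and the field of view of one frame. The short calculation below is a sketch under one stated assumption, namely that "4K" means 4096 pixels per side; the function names are illustrative.

```python
ANGSTROM_PER_PIXEL = 1.07  # sampling rate stated in the text
MAGNIFICATION = 141_110    # detector magnification stated in the text
CCD_PIXELS = 4096          # assumption: "4K" means 4096 pixels per side


def nyquist_resolution(apix):
    """Finest resolvable spacing (Å) at a given sampling rate."""
    return 2.0 * apix


def physical_pixel_um(apix, magnification):
    """Detector pixel size (µm) implied by sampling and magnification."""
    return apix * magnification * 1e-4  # 1 Å = 1e-4 µm


def field_of_view(apix, pixels):
    """Edge length (Å) of the specimen area covered by one frame."""
    return apix * pixels


print(f"Nyquist limit: {nyquist_resolution(ANGSTROM_PER_PIXEL):.2f} Å")
print(f"Detector pixel: {physical_pixel_um(ANGSTROM_PER_PIXEL, MAGNIFICATION):.1f} µm")
print(f"Field of view: {field_of_view(ANGSTROM_PER_PIXEL, CCD_PIXELS):.0f} Å")
```

The Nyquist limit of 2.14 Å lies well beyond the reported map resolution, so the sampling rate itself is not the limiting factor.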

Image processing 

We carefully screened all the CCD frames, from which 3558 images
with evident signals up to 1/5 Å⁻¹ in their 1D power spectra were
selected for subsequent processing. A total of 37 315 virus particles
were automatically boxed out using ethan (Kivioja et al, 2000),
of which ~10 000 particles were on continuous carbon film.
The contrast transfer function parameters for each CCD image were
manually determined using ctfit in EMAN1 (Ludtke et al, 1999). An
initial model at ~7 Å resolution was quickly obtained by MPSA (Liu
et al, 2007). The structure was further refined by EMAN1 using the
standard projection matching method with progressively decreasing
angular step size (with a final value of 0.4°). After each iteration,
the non-icosahedral part, including the lipids and the RNA, in the
reconstruction was removed by a soft-edged mask, which defines
the outline of the icosahedrally organized, low-pass filtered
'protein-only' content in the map. This masked map then served
as the reference model for the next iteration. The resolution of the
final reconstruction was estimated to be 4.8 Å based on the 0.5
criterion of the Fourier shell correlation (FSC) between two
independent reconstructions (Van Heel, 1987; Supplementary
Figure S1).
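
For readers unfamiliar with the FSC criterion, the sketch below shows how a Fourier shell correlation curve and its 0.5 crossing can be computed from two independent maps. It is a minimal NumPy illustration, not the EMAN1 implementation used in this work; the function names and the choice of integer-radius shells are assumptions.

```python
import numpy as np


def fourier_shell_correlation(map_a, map_b, voxel_size):
    """Return (spatial frequency in 1/Å, FSC) for two cubic maps of equal shape."""
    if map_a.shape != map_b.shape or len(set(map_a.shape)) != 1:
        raise ValueError("maps must be cubic and of identical shape")
    n = map_a.shape[0]
    fa = np.fft.fftn(map_a)
    fb = np.fft.fftn(map_b)
    # Integer shell index (rounded distance from the origin) of each Fourier voxel.
    freq = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    n_shells = n // 2
    keep = shell < n_shells  # discard the corners beyond the Nyquist shell
    idx = shell[keep]
    cross = np.bincount(idx, weights=(fa * np.conj(fb)).real[keep], minlength=n_shells)
    power_a = np.bincount(idx, weights=(np.abs(fa) ** 2)[keep], minlength=n_shells)
    power_b = np.bincount(idx, weights=(np.abs(fb) ** 2)[keep], minlength=n_shells)
    fsc = cross / np.sqrt(power_a * power_b)
    spatial_freq = np.arange(n_shells) / (n * voxel_size)
    return spatial_freq, fsc


def resolution_at(spatial_freq, fsc, threshold=0.5):
    """Resolution (Å) of the first shell whose FSC falls below the threshold."""
    below = np.nonzero(fsc < threshold)[0]
    if below.size == 0:
        return None  # the curve never drops below the threshold
    first = below[0]
    return float("inf") if first == 0 else 1.0 / spatial_freq[first]
```

In practice the two inputs would be reconstructions computed from disjoint halves of the particle set, with `voxel_size` set to the sampling rate of the map.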

Averaging subunits within an asymmetric unit 

To improve the resolvability in our density map, we computationally
segmented out the densities for the four unique sets of
E1-E2-E3-CP molecules in one asymmetric unit using Chimera (Goddard
et al, 2007). The four molecules (E1, E2, E3 and CP) in each set
were treated as an intact unit during the segmentation. We then
used the foldhunterP program (Baker et al, 2007) in EMAN1 to align the
four pieces of segmented densities and used proc3d in EMAN1 to
compute the average. To estimate the resolution of our averaged
map, we applied the same averaging technique to the two
independent reconstructions that were used to calculate the 4.8-Å
resolution for the original density map, and then calculated the FSC
between the two resulting averaged maps.
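
The benefit of averaging the four quasi-equivalent copies can be illustrated with a toy calculation: if the copies are already aligned and their noise is independent, the voxel-wise mean reduces the noise by roughly a factor of two (the square root of four). The sketch below assumes pre-aligned maps stored as flat lists and uses synthetic data; it stands in for, and is not, the foldhunterP/proc3d procedure.

```python
import math
import random


def average_maps(maps):
    """Voxel-wise mean of equally sized, already aligned maps (flat lists)."""
    if not maps:
        raise ValueError("need at least one map")
    length = len(maps[0])
    if any(len(m) != length for m in maps):
        raise ValueError("all maps must have the same number of voxels")
    return [sum(voxels) / len(maps) for voxels in zip(*maps)]


def rms_error(estimate, truth):
    """Root-mean-square deviation between an estimate and the true signal."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))


# Synthetic example: one underlying signal observed four times with independent noise.
rng = random.Random(0)
truth = [math.sin(0.05 * i) for i in range(5000)]
copies = [[t + rng.gauss(0.0, 1.0) for t in truth] for _ in range(4)]
averaged = average_maps(copies)
ratio = rms_error(averaged, truth) / rms_error(copies[0], truth)
print(f"noise after averaging / noise in one copy: {ratio:.2f}")
```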

Model building and refinement 

To model the VEEV E1, E2 ectodomains and E3 (E1: residues 1-389;
E2: residues 1-341 and E3: residues 1-59), the sequence alignment
and subsequent homology modelling were performed by MODELLER
(Eswar et al, 2006), using the crystal structure of its CHIKV
homologue (Voss et al, 2010) (PDB ID: 3N40) as the template. The
missing parts in the crystal structure (E1: residues 390-442; E2:
residues 342-423) were modelled de novo by first tracing the
backbones using GORGON (Baker et al, 2011), with several visible
side-chain densities serving as the anchor points.

To model the TM helices, in particular, we generated the Cα
models of two consecutive helices (residues 403-412 and 415-442)
separated by a kink for E1, and a long straight helix (residues
367-401) for E2. These residue assignments are based on both the
secondary structure predictions from PSIPRED online server
(McGuffin et al, 2000; Figure 5F) and our cryo-EM density map.
The three helical models were placed in the corresponding densities
in GORGON, and the registration of their Cα positions was
determined by the evident bulky side-chain densities along the
helices (e.g., E1-W407, E1-W409, E1-Y434 and E2-W387).
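
As an illustration of what a Cα helical model contains, the sketch below generates idealized α-helix Cα coordinates for a given residue range. It assumes canonical helix geometry (1.5 Å rise and 100° rotation per residue, Cα radius 2.3 Å) and a helix axis along z; it does not reproduce GORGON's placement of the helices in the density or the side-chain-based registration described above.

```python
import math

RISE_PER_RESIDUE = 1.5    # Å along the helix axis (canonical α-helix)
TURN_PER_RESIDUE = 100.0  # degrees of rotation per residue
CA_RADIUS = 2.3           # Å, distance of each Cα from the helix axis


def ideal_helix_ca(first_residue, last_residue):
    """Return [(residue number, x, y, z)] for an ideal helix along the z axis."""
    coords = []
    for i, resnum in enumerate(range(first_residue, last_residue + 1)):
        theta = math.radians(i * TURN_PER_RESIDUE)
        coords.append((resnum,
                       CA_RADIUS * math.cos(theta),
                       CA_RADIUS * math.sin(theta),
                       i * RISE_PER_RESIDUE))
    return coords


def helix_length(first_residue, last_residue):
    """Axial distance (Å) between the first and last Cα atoms."""
    return (last_residue - first_residue) * RISE_PER_RESIDUE


# Residue ranges of the three TM helical segments given in the text.
for name, first, last in [("E1 helix 1", 403, 412),
                          ("E1 helix 2", 415, 442),
                          ("E2 helix", 367, 401)]:
    print(f"{name}: {last - first + 1} residues, {helix_length(first, last):.1f} Å")
```

Shifting the residue numbering along such a model by one position moves every side chain by about 100° around the axis, which is why a few bulky side-chain densities suffice to fix the registration.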

Next, we converted all the de novo traced Cα models to their
corresponding all-atom models of VEEV using SABBAC online
server (Maupetit et al, 2006), and then stitched the homology and
de novo portions together in COOT (Emsley and Cowtan, 2004) to
generate the initial full-length E1 and E2 atomic models.

Our model for the CP protease domain is taken directly from the
previous crystal structure of VEEV TC-83 (PDB ID: 1EP5: A, residues
120-275), and was fitted into the density map as a rigid body using
CHIMERA's Fit to Map function. The α-helix of CP (residues
115-124) was modelled de novo in the same way as the E1 and E2
TM helices.

Finally, we used ROSETTA (DiMaio et al, 2009) to refine the
full-length E1, E2 and the structured part of CP. The E3 homology
model was not further refined due to the less-resolved quality
of our E3 density. ROSETTA uses the cryo-EM density as a restraint,
along with energy minimization to eliminate the steric clashes
and ensure proper molecular geometry. In total, two separate
refinements were performed. First, one set of E1-E2-CP molecules
was refined against the asymmetric unit averaged map.
Second, four copies of the models from the first round were
placed at the T=4 related positions within the asymmetric unit,
and these four sets of E1-E2-CP molecules were refined together
against the original cryo-EM density map to produce our final
model.
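
The copy numbers implied by T=4 icosahedral symmetry follow from the standard Caspar-Klug relation T = h² + hk + k², with 60T subunits per shell. The short sketch below works through that arithmetic; the function names are illustrative.

```python
def triangulation_number(h, k):
    """Caspar-Klug triangulation number for lattice indices (h, k)."""
    return h * h + h * k + k * k


def subunits_per_shell(t_number):
    """Number of quasi-equivalent subunits in an icosahedral shell."""
    return 60 * t_number


t = triangulation_number(2, 0)  # (h, k) = (2, 0) gives T = 4
print(f"T = {t}: {t} copies per asymmetric unit, "
      f"{subunits_per_shell(t)} copies per shell")
```

Refining the four sets in one asymmetric unit together therefore describes every copy in the icosahedrally ordered shells once the 60-fold symmetry is applied.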



Accession numbers 

Atomic coordinates of our refined model of E1-E2-CP and unrefined 
homology model of E3 within one asymmetric unit have been 
deposited with the Protein Data Bank under the accession codes 
3J0C and 3J0G, respectively. A cubic portion of the original 3D 
density map (covering one asymmetric unit) and the asymmetric 
unit averaged map have been deposited in the Macromolecular 
Structure Database at the European Bioinformatics Institute under 
the accession codes 5275 and 5276, respectively. 



Supplementary data 

Supplementary data are available at The EMBO Journal Online 
(http://www.embojournal.org) . 